AITopics

Country:

North America > United States > Virginia (0.04)
North America > Canada (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing Systems, Dec-24-2025, 08:22:24 GMT

Self-Learning Transformations for Improving Gaze and Head Redirection

Many computer vision tasks rely on labeled data. Rapid progress in generative modeling has led to the ability to synthesize photorealistic images. However, controlling specific aspects of the generation process such that the data can be used for supervision of downstream tasks remains challenging. In this paper we propose a novel generative model for images of faces that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles. This requires the disentangling of many appearance-related factors, including gaze and head orientation but also lighting, hue, etc. We propose a novel architecture which learns to discover, disentangle and encode these extraneous variations in a self-learned manner. We further show that explicitly disentangling task-irrelevant factors results in more accurate modeling of gaze and head orientation. A novel evaluation scheme shows that our method improves upon the state-of-the-art in redirection accuracy and disentanglement between gaze direction and head orientation changes. Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation.
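One common way to score how closely a redirected gaze matches its target is the angle between the two gaze directions. The sketch below is illustrative only and is not taken from the paper: the pitch/yaw-to-vector convention (radians, camera looking along -z) and the names gaze_vector and angular_error_deg are assumptions.

```python
import math


def gaze_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D gaze direction.

    Assumed convention: the camera looks along -z, positive pitch tilts
    the gaze toward -y. Other datasets use different sign conventions.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(target, redirected):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    a = gaze_vector(*target)
    b = gaze_vector(*redirected)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))
```

The same function can be applied to head orientation angles if they are expressed in the same (pitch, yaw) form.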

gaze and head redirection, name change, self-learning transformation, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

arXiv.org Artificial Intelligence, Oct-8-2025

mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies

Steiner, Remo, Millane, Alexander, Tingdahl, David, Volk, Clemens, Ramasamy, Vikram, Yao, Xinjie, Du, Peter, Pouya, Soha, Sheng, Shiwei

End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.

artificial intelligence, machine learning, reconstruction, (17 more...)

2509.20297

Country: Europe > Switzerland (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing Systems, Aug-15-2025, 07:21:15 GMT

98f2d76d4d9caf408180b5abfa83ae87-Paper.pdf

architecture, extraneous factor, head orientation, (14 more...)

Country:

North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Cui, Xue, Zakka, Vincent Gbouna, Lee, Minhyun

A computer vision-based model for occupancy detection using low-resolution thermal images

arXiv.org Artificial Intelligence, May-14-2025

Occupancy plays an essential role in influencing the energy consumption and operation of heating, ventilation, and air conditioning (HVAC) systems. Traditional HVAC systems typically operate on fixed schedules without considering occupancy. Advanced occupant-centric control (OCC) uses occupancy status to regulate HVAC operations. RGB images combined with computer vision (CV) techniques are widely used for occupancy detection; however, the detailed facial and body features they capture raise significant privacy concerns. Low-resolution thermal images offer a non-invasive solution that mitigates privacy issues. The study developed an occupancy detection model utilizing low-resolution thermal images and CV techniques, where transfer learning was applied to fine-tune the You Only Look Once version 5 (YOLOv5) model. The developed model ultimately achieved satisfactory performance, with precision, recall, and mAP50 values approaching 1.000. The contributions of this model lie not only in mitigating privacy concerns but also in reducing computing resource demands.
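For readers unfamiliar with the reported metrics: mAP50 treats a predicted box as a correct detection when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, and precision and recall follow from the resulting counts. The sketch below is a minimal illustration, not the study's evaluation code; the (x1, y1, x2, y2) box layout and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from detection counts at a fixed IoU threshold."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

A detection would count toward true_pos here when iou(prediction, ground_truth) >= 0.5.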

artificial intelligence, machine learning, thermal image, (15 more...)

2505.08336

Country:

Asia > China > Hong Kong (0.05)
Europe > United Kingdom (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (0.50)

Industry:

Construction & Engineering > HVAC (1.00)
Information Technology > Security & Privacy (0.96)
Energy (0.89)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing Systems, Oct-10-2024, 21:09:56 GMT

Self-Learning Transformations for Improving Gaze and Head Redirection

gaze and head redirection, head orientation, self-learning transformation

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.60)

Hernandez-Cruz, Vanessa, Zhang, Xiaotong, Youcef-Toumi, Kamal

Bayesian Intention for Enhanced Human Robot Collaboration

arXiv.org Artificial Intelligence, Sep-30-2024

Predicting human intent is challenging yet essential to achieving seamless Human-Robot Collaboration (HRC). Many existing approaches fail to fully exploit the inherent relationships between objects, tasks, and the human model. Current methods for predicting human intent, such as Gaussian Mixture Models (GMMs) and Conditional Random Fields (CRFs), often lack interpretability due to their failure to account for causal relationships between variables. To address these challenges, in this paper, we develop a novel Bayesian Intention (BI) framework to predict human intent within a multi-modality information framework in HRC scenarios. This framework captures the complexity of intent prediction by modeling the correlations between human behavior conventions and scene data. Our framework leverages these inferred intent predictions to optimize the robot's response in real-time, enabling smoother and more intuitive collaboration. We demonstrate the effectiveness of our approach through an HRC task involving a UR5 robot, highlighting BI's capability for real-time human intent prediction and collision avoidance using a unique dataset we created. Our evaluations show that the multi-modality BI model predicts human intent within 2.69 ms, with a 36% increase in precision, a 60% increase in F1 Score, and an 85% increase in accuracy compared to its best baseline method. The results underscore BI's potential to advance real-time human intent prediction and collision avoidance, making a significant contribution to the field of HRC.
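The core operation behind any Bayesian intent predictor is a posterior update over candidate intents given an observation. The sketch below shows only that single step with Bayes' rule on a discrete set of intents. It is not the BI framework itself, which the abstract describes as modeling multi-modal cues; the intent names, the probabilities, and the function update_intent_belief are made up for illustration.

```python
def update_intent_belief(prior, likelihood):
    """Return the posterior P(intent | observation) via Bayes' rule.

    prior:      dict mapping intent -> P(intent)
    likelihood: dict mapping intent -> P(observation | intent)
    """
    unnormalized = {k: prior[k] * likelihood.get(k, 0.0) for k in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every intent")
    return {k: v / evidence for k, v in unnormalized.items()}


# Hypothetical example: two candidate intents and one observed cue
# (say, the hand moving toward the tool tray). All numbers are illustrative.
prior = {"reach_tool": 0.5, "reach_part": 0.5}
likelihood = {"reach_tool": 0.9, "reach_part": 0.3}
posterior = update_intent_belief(prior, likelihood)
```

Feeding each posterior back in as the next prior gives a running belief as new observations arrive.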

intent prediction, orientation, prediction, (14 more...)

2410.00302

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Ledder, Wessel, Qin, Yuzhen, van der Heijden, Kiki

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments

arXiv.org Artificial Intelligence, Sep-16-2024

Although deep reinforcement learning (DRL) approaches in audio signal processing have seen substantial progress in recent years, audio-driven DRL for tasks such as navigation, gaze control and head-orientation control in the context of human-robot interaction has received little attention. Here, we propose an audio-driven DRL framework in which we utilise deep Q-learning to develop an autonomous agent that orients towards a talker in the acoustic environment based on stereo speech recordings. Our results show that the agent learned to perform the task at a near perfect level when trained on speech segments in anechoic environments (that is, without reverberation). The presence of reverberation in naturalistic acoustic environments affected the agent's performance, although the agent still substantially outperformed a baseline, randomly acting agent. Finally, we quantified the degree of generalization of the proposed DRL approach across naturalistic acoustic environments. Our experiments revealed that policies learned by agents trained on medium or high reverb environments generalized to low reverb environments, but policies learned by agents trained on anechoic or low reverb environments did not generalize to medium or high reverb environments. Taken together, this study demonstrates the potential of audio-driven DRL for tasks such as head-orientation control and highlights the need for training strategies that enable robust generalization across environments for real-world audio-driven DRL applications.
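The Q-learning update that deep Q-learning builds on can be shown on a toy version of the orientation task. The sketch below is a tabular stand-in under strong assumptions, not the paper's setup: the agent observes its own heading index directly instead of stereo audio, there is no neural network, and the environment (8 discrete headings, a fixed talker direction, reward 1 when facing it) and all names are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(n_headings=8, target=3, episodes=2000, steps=30, epsilon=0.3, seed=0):
    """Learn to turn a head on a ring of discrete headings toward `target`."""
    rng = random.Random(seed)
    moves = (-1, 0, 1)  # turn left, hold, turn right
    q = [[0.0] * len(moves) for _ in range(n_headings)]
    for _ in range(episodes):
        state = rng.randrange(n_headings)
        for _ in range(steps):
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.randrange(len(moves))
            else:
                # Exploit: pick the action with the highest current estimate.
                action = max(range(len(moves)), key=lambda a: q[state][a])
            next_state = (state + moves[action]) % n_headings
            reward = 1.0 if next_state == target else 0.0
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


def greedy_action(q, state):
    """Index into (left, hold, right) of the best learned action."""
    return max(range(len(q[state])), key=lambda a: q[state][a])
```

In the deep variant, the table q is replaced by a network that maps the audio observation to one value per action, while the update target keeps the same form.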

acoustic environment, agent, orientation deviation, (13 more...)

2409.10048

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report > New Finding (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Schreiter, Tim, Rudenko, Andrey, Magnusson, Martin, Lilienthal, Achim J.

Human Gaze and Head Rotation during Navigation, Exploration and Object Manipulation in Shared Environments with Robots

arXiv.org Artificial Intelligence, Jun-10-2024

The human gaze is an important cue to signal intention, attention, distraction, and the regions of interest in the immediate surroundings. Gaze tracking can transform how robots perceive, understand, and react to people, enabling new modes of robot control, interaction, and collaboration. In this paper, we use gaze tracking data from a rich dataset of human motion (THÖR-MAGNI) to investigate the coordination between gaze direction and head rotation of humans engaged in various indoor activities involving navigation, interaction with objects, and collaboration with a mobile robot. In particular, we study the spread and central bias of fixations in diverse activities and examine the correlation between gaze direction and head rotation. We introduce various human motion metrics to enhance the understanding of gaze behavior in dynamic interactions. Finally, we apply semantic object labeling to decompose the gaze distribution into activity-relevant regions. Robots operating in shared environments with humans can benefit significantly from the ability to track and interpret various cues related to human motion and activity.

interaction, participant, robot, (15 more...)

2406.063

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Sweden > Örebro County > Örebro (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Automobiles & Trucks (0.68)
Transportation (0.46)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.35)

Neural Information Processing Systems, Mar-14-2024, 04:08:27 GMT

3D Social Saliency from Head-mounted Cameras

A gaze concurrence is a point in 3D where the gaze directions of two or more people intersect. It is a strong indicator of social saliency because the attention of the participating group is focused on that point. In scenes occupied by large groups of people, multiple concurrences may occur and transition over time. In this paper, we present a method to construct a 3D social saliency field and locate multiple gaze concurrences that occur in a social scene from videos taken by head-mounted cameras. We model the gaze as a cone-shaped distribution emanating from the center of the eyes, capturing the variation of eye-in-head motion. We calibrate the parameters of this distribution by exploiting the fixed relationship between the primary gaze ray and the head-mounted camera pose. The resulting gaze model enables us to build a social saliency field in 3D. We estimate the number and 3D locations of the gaze concurrences via provably convergent mode-seeking in the social saliency field. Our algorithm is applied to reconstruct multiple gaze concurrences in several real-world scenes and evaluated quantitatively against motion-captured ground truth.
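Mode-seeking can be illustrated with a generic mean-shift iteration, which repeatedly moves a query point to the kernel-weighted mean of nearby samples until it settles on a local density peak. The sketch below is a stand-in under stated assumptions rather than the paper's algorithm: it runs on a plain set of 3D sample points with a Gaussian kernel, whereas the paper's field is built from cone-shaped gaze distributions; the function name and parameters are hypothetical.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Climb from `start` to a local density mode of `points`.

    points: list of equal-length coordinate tuples (for example 3D samples)
    start:  initial query point
    """
    x = tuple(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current point.
        weights = [
            math.exp(
                -sum((pi - xi) ** 2 for pi, xi in zip(p, x))
                / (2 * bandwidth ** 2)
            )
            for p in points
        ]
        total = sum(weights)
        if total == 0.0:
            break  # no sample is close enough to pull the point anywhere
        new_x = tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(x))
        )
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x
```

Starting the iteration from several seeds and merging the end points that coincide gives both the number of modes and their locations.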

concurrence, head-mounted camera, ray, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.94)
Information Technology > Artificial Intelligence > Robots (0.68)